Data Science Jobs and Salary Analysis and Visualization

Introduction

We ran a questionnaire-based survey among colleagues from work and university who hold data science and business analyst roles in different departments. We then added the collected responses to the original data and examined how our observations changed the original results. The study follows these steps:

Find a simple study based on small-sample survey data.
Reproduce the original findings using the original data.
Collect around 70 new observations.
Replicate the results with the extended sample.
Present our findings and discuss them.

Data Description

Observation: The dataset consists of 12 columns and 607 rows.
Source: https://www.kaggle.com/code/manishwahale/data-science-job-salaries/data

Columns: Unnamed: 0, work_year, experience_level, employment_type, job_title, salary, salary_currency, salary_in_usd, employee_residence, remote_ratio, company_location, company_size

Apart from Unnamed: 0 (a row index), the dataset consists of 11 features.

work_year: The year the salary was paid.
experience_level: The experience level in the job during the year. EN (Entry-level / Junior), MI (Mid-level / Intermediate), SE (Senior-level / Expert), EX (Executive-level / Director).
employment_type: The type of employment for the role. PT (Part-time), FT (Full-time), CT (Contract), FL (Freelance).
job_title: The role worked during the year.
salary: The total gross salary amount paid.
salary_currency: The currency of the salary paid, as an ISO 4217 currency code, e.g. USD (United States Dollar), EUR (Euro), GBP (Great Britain Pound).
salary_in_usd: The salary in USD (United States Dollars).
employee_residence: The employee's primary country of residence during the work year, as an ISO 3166 country code, e.g. DE (Germany), JP (Japan), GB (Great Britain).
remote_ratio: The overall amount of work done remotely. 0 (No remote), 50 (Partially remote), 100 (Fully remote).
company_location: The country of the employer's main office or contracting branch, as an ISO 3166 country code, e.g. DE (Germany), JP (Japan), GB (Great Britain).
company_size: The average number of people that worked for the company during the year. S (Small, fewer than 50), M (Medium, 50 to 250), L (Large, more than 250).

Importing Libraries

Before loading the data, we first import (and, where necessary, install) the required libraries.

Loading Data

Original data

Observations added data

The data with the additional observations is stored alongside the original data under the name "ds_salaries_observations_added" and is used for comparison throughout the analysis.

The additional observations comprise around 70 new records of Polish salaries for data science positions for the years 2020-2022. They were gathered from the different banks we work with and from university colleagues who work as data scientists.
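The step of extending the original data with the survey records can be sketched as follows. This is a minimal illustration, not the notebook's actual code: the rows below are hypothetical stand-ins (the real analysis would load the Kaggle CSV and the collected survey), and it assumes both tables share the same column names.

```python
import pandas as pd

# Hypothetical stand-in rows; the real notebook would load the Kaggle CSV instead.
original = pd.DataFrame({
    "work_year": [2020, 2021, 2022],
    "job_title": ["Data Scientist", "Data Engineer", "Data Analyst"],
    "salary_in_usd": [79833, 150000, 99000],
    "company_location": ["DE", "US", "US"],
})

# Hypothetical survey rows for Poland (values invented for illustration only).
survey = pd.DataFrame({
    "work_year": [2021, 2022],
    "job_title": ["Data Scientist", "Data Analyst"],
    "salary_in_usd": [40000, 35000],
    "company_location": ["PL", "PL"],
})

# Stack the survey rows under the original rows and renumber the index.
observations_added = pd.concat([original, survey], ignore_index=True)
```

Keeping the original frame untouched and building a second, extended frame makes it straightforward to run every later step on both and compare the outputs.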

Next, we present the column names of the data.

At this stage, we show the frequency of each job type in the original data.

At this stage, we show the frequency of each job type in the observations-added data.
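A frequency table like the one described above can be produced with a single pandas call. The sketch below assumes that "job type" refers to the job_title column; the sample rows are hypothetical.

```python
import pandas as pd

# Hypothetical sample; the real data has one row per surveyed position.
df = pd.DataFrame({
    "job_title": [
        "Data Scientist", "Data Engineer", "Data Scientist",
        "Data Analyst", "Data Scientist", "Data Engineer",
    ]
})

# Count how often each job title occurs, most frequent first.
job_counts = df["job_title"].value_counts()
print(job_counts)
```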

We drop the salary and salary_currency columns because the salary_in_usd column is available and gives a common unit for all job positions.

salary_in_usd is common to both datasets, so we drop salary and salary_currency from the observations-added data as well.
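One way to apply the same column drop to both datasets is a small helper, sketched below. The function name drop_local_salary is hypothetical and the sample row is invented for illustration.

```python
import pandas as pd

def drop_local_salary(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy without the local-currency columns, keeping salary_in_usd."""
    return df.drop(columns=["salary", "salary_currency"])

# Hypothetical single-row sample.
sample = pd.DataFrame({
    "job_title": ["Data Scientist"],
    "salary": [70000],
    "salary_currency": ["EUR"],
    "salary_in_usd": [79833],
})

cleaned = drop_local_salary(sample)
```

Because drop returns a new frame by default, the input frame still has all its columns afterwards.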

Changing some of the column values for Original data for better understanding

For instance, in the experience level column we change the value 'EN', which indicates entry level, to 'Entry Level', so that experience levels read clearly in the results.

Employment type values are likewise expanded from their abbreviations to readable labels, and remote ratio is relabelled following the same logic.
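The relabelling described above might look like the sketch below. Only 'Entry Level' is given in the text; the other label spellings (for example "Full Time" or "Fully Remote") and the helper name relabel are assumptions made for illustration.

```python
import pandas as pd

# Code-to-label mappings; label spellings other than 'Entry Level' are assumed.
EXPERIENCE_LABELS = {
    "EN": "Entry Level", "MI": "Mid Level",
    "SE": "Senior Level", "EX": "Executive Level",
}
EMPLOYMENT_LABELS = {
    "PT": "Part Time", "FT": "Full Time",
    "CT": "Contract", "FL": "Freelance",
}
REMOTE_LABELS = {0: "No Remote", 50: "Partially Remote", 100: "Fully Remote"}

def relabel(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with coded values replaced by readable labels."""
    out = df.copy()
    out["experience_level"] = out["experience_level"].replace(EXPERIENCE_LABELS)
    out["employment_type"] = out["employment_type"].replace(EMPLOYMENT_LABELS)
    # map() turns the numeric ratio into a text label.
    out["remote_ratio"] = out["remote_ratio"].map(REMOTE_LABELS)
    return out

# Hypothetical sample rows.
sample = pd.DataFrame({
    "experience_level": ["EN", "SE"],
    "employment_type": ["FT", "FL"],
    "remote_ratio": [0, 100],
})
readable = relabel(sample)
```

Wrapping the mappings in one function means the identical transformation can be applied to the original and the observations-added data without duplicating the dictionaries.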

Changing some of the column values for Observations added data for better understanding

Because both datasets have the same structure, we change the column values in the observations-added data as well.

Formatting columns for Original data

Formatting columns for Additional observations added data

Unique Values in each Column

Observing Unique values in each Column in Original data

Observing Unique values in each Column in Observations added data
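Inspecting the unique values per column could be done as in this sketch. The sample frame is hypothetical and uses only a few of the dataset's columns.

```python
import pandas as pd

# Hypothetical sample with three of the dataset's columns.
df = pd.DataFrame({
    "experience_level": ["Entry Level", "Mid Level", "Mid Level", "Senior Level"],
    "company_size": ["S", "M", "M", "L"],
    "work_year": [2020, 2021, 2021, 2022],
})

# Number of distinct values in each column.
unique_counts = df.nunique()

# The distinct values themselves, keyed by column name.
unique_values = {col: sorted(df[col].unique().tolist()) for col in df.columns}
```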

Observational Analysis

For our next analysis, we take the maximum salary for each job type.

This observation is for the original data.

We repeat the maximum salary per job type for the second dataset.

This observation is for the observations-added data.
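The maximum salary per job type can be computed with a group-by, as sketched here. It assumes the grouping column is job_title and the salary column is salary_in_usd; the figures are invented.

```python
import pandas as pd

# Hypothetical sample salaries.
df = pd.DataFrame({
    "job_title": [
        "Data Scientist", "Data Scientist", "Data Engineer",
        "Data Analyst", "Data Engineer",
    ],
    "salary_in_usd": [120000, 95000, 140000, 70000, 110000],
})

# Highest salary observed for each job title, largest first.
max_salary = (
    df.groupby("job_title")["salary_in_usd"]
    .max()
    .sort_values(ascending=False)
)
```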

Here we analyze company location: we count the companies per country and display the countries that host the most companies.

This observation is for the original data.

As we can see here, the top 3 host countries are the USA, the UK and Canada.

We repeat the company location count for the second dataset.

This observation is for the observations-added data.

After adding our survey observations, the top 3 host countries change: the USA is still 1st, Poland is now 2nd, and the UK is 3rd.
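The change in the top 3 can be reproduced in miniature with the sketch below. The country counts are hypothetical and chosen only to mirror the ranking shift described above; the helper name top_locations is also an assumption.

```python
import pandas as pd

def top_locations(df: pd.DataFrame, n: int = 3) -> pd.Series:
    """Return the n most frequent company locations with their counts."""
    return df["company_location"].value_counts().head(n)

# Hypothetical counts, not the real dataset.
original = pd.DataFrame(
    {"company_location": ["US"] * 5 + ["GB"] * 3 + ["CA"] * 2 + ["DE"]}
)
survey = pd.DataFrame({"company_location": ["PL"] * 4})
extended = pd.concat([original, survey], ignore_index=True)

top_original = top_locations(original)
top_extended = top_locations(extended)
```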

Next, we show how many positions were recorded in each work year.

This observation is for the original data.

This observation is for the observations-added data.

From the above analysis of work years, we see that the number of data science jobs in the data increased each year. We expect this growth to continue, given the demand for this type of job.

Here we show the frequency of each position level in our data.

Original data

We repeat the position level frequency for the second dataset.

This observation is for the observations-added data.

As we can see, the counts changed after we added our survey data, with a notable rise in the Executive Level and Mid Level positions.
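To put the two frequency tables side by side rather than reading them from separate outputs, a helper like the following could be used. The function name compare_counts and the sample values are hypothetical.

```python
import pandas as pd

def compare_counts(original: pd.DataFrame, extended: pd.DataFrame, column: str) -> pd.DataFrame:
    """Side-by-side value counts for one column, plus the difference."""
    counts = pd.concat(
        [original[column].value_counts(), extended[column].value_counts()],
        axis=1,
        keys=["original", "observations_added"],
    )
    # Categories missing from one dataset get a count of zero.
    counts = counts.fillna(0).astype(int)
    counts["change"] = counts["observations_added"] - counts["original"]
    return counts

# Hypothetical experience levels before and after adding survey rows.
original = pd.DataFrame({"experience_level": ["SE", "SE", "MI", "EN"]})
extended = pd.DataFrame({"experience_level": ["SE", "SE", "MI", "EN", "MI", "EN", "EX"]})

comparison = compare_counts(original, extended, "experience_level")
```

The same helper works for company_size or any other categorical column, since the column name is a parameter.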

Once again, we show the frequency of each company size type in our data.

This observation is for the original data.

This observation is for the observations-added data.

As we can see, the counts change again after we add our survey data.

At this stage, we present the results as charts and graphs.

Univariate Analysis

The univariate analysis is done with the plotly package, which makes it possible to arrange several subplots in one output and get a general overview of the data.

Original Data

From the above plot we can learn that:

Most employees are mid level (35%) or senior level (46%).
The top 5 job titles are Data Scientist, Data Engineer, Data Analyst, Machine Learning Engineer and Research Scientist.
Most of the employees have a full-time job, whereas freelancing is the least common.
Salaries mostly lie between 40k and 160k.
Most of the companies are mid-sized, and work is mostly fully remote.
Most of the companies are located in the USA, and most employees also reside in the USA.

Observations added data

From the above plot we can learn that:

Most employees are mid level (35.9%) or senior level (41.1%). In Poland most of these roles are still considered a new type of work, so positions are mostly entry level or junior. In the plot this shows up as an increase in entry and mid level positions and a decrease in senior level ones.
The top 5 job titles are Data Scientist, Data Engineer, Data Analyst, Machine Learning Engineer and Research Scientist.
Most of the employees have a full-time job, whereas contract work is the least common. In the original data, freelance was the least common. The difference arises because freelancing is popular in Poland, so it gains counts relative to part-time and contract work.
Salaries mostly lie between 40k and 140k.
Most of the companies are mid-sized, and work is mostly fully remote. Compared with the original data, however, small and large companies are relatively more common in Poland, which is why their shares increase slightly.
Most of the companies are located in the USA, and most employees also reside in the USA. Poland ranks 2nd for employee residence with a count of 71, while GB is 3rd with 47.

Violin Distribution Plots

Experience Level vs Salary in USD

Experience level and salary in USD comparison for Original data

From the above we can say that there is a positive relationship between experience and salary: more experience means a higher salary. More precisely, entry level salaries mostly range from about 30k to 100k, mid level from 70k to 150k, senior level from 100k to 220k and executive level from 200k to 700k.
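The ranges read off the violin plot can be cross-checked numerically with per-level quartiles. The sketch below uses invented salaries and assumes the readable level labels have already been applied.

```python
import pandas as pd

# Hypothetical salaries per experience level.
df = pd.DataFrame({
    "experience_level": (
        ["Entry Level"] * 3 + ["Mid Level"] * 3
        + ["Senior Level"] * 3 + ["Executive Level"] * 3
    ),
    "salary_in_usd": [
        30000, 50000, 70000,
        70000, 100000, 130000,
        100000, 150000, 200000,
        200000, 300000, 400000,
    ],
})

# Quartiles of salary within each experience level (rows: levels, columns: quantiles).
order = ["Entry Level", "Mid Level", "Senior Level", "Executive Level"]
quartiles = (
    df.groupby("experience_level")["salary_in_usd"]
    .quantile([0.25, 0.5, 0.75])
    .unstack()
    .reindex(order)
)

# True if the median salary rises from each level to the next.
medians_increase = quartiles[0.5].is_monotonic_increasing
```

Since experience level is an ordered category rather than a number, checking that the medians rise in order is a more direct test of the claim than fitting a line.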

Experience level and salary in USD comparison with Observations added data

Because salary rises with experience in the same way here, we do not observe any substantial change in the salary vs experience comparison for the observations-added data either.

Job Titles in the years 2020, 2021 and 2022

Comparison of Job titles for the given years in Original Data

From the above plot we clearly see that over these 3 years the number of Data Scientist, Data Engineer and Data Analyst roles roughly doubled each year compared to the previous one, and Machine Learning Engineer roles also tend to increase. All of this indicates that these and related roles are in high demand on the market. This makes sense at the beginning of the 21st century, the century of information and data, and it explains why the numbers rise so sharply in such a short period of time.
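The year-over-year growth claim can be checked with a cross-tabulation of job title against work year. The counts below are hypothetical and do not reproduce the dataset.

```python
import pandas as pd

# Hypothetical (job_title, work_year) rows.
rows = (
    [("Data Scientist", 2020)] * 1
    + [("Data Scientist", 2021)] * 2
    + [("Data Scientist", 2022)] * 4
    + [("Data Engineer", 2020)] * 1
    + [("Data Engineer", 2021)] * 2
    + [("Data Engineer", 2022)] * 3
)
df = pd.DataFrame(rows, columns=["job_title", "work_year"])

# Rows: job titles, columns: years, cells: number of positions.
counts = pd.crosstab(df["job_title"], df["work_year"])

# Ratio of 2022 to 2021 counts; a value of 2.0 means the role doubled.
growth_2022 = counts[2022] / counts[2021]
```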

Comparison of Job titles for the given years in Observations added Data

Number of Job Roles in a company with a certain Company Size

Job Roles in a company with company size for Original data

The top roles by count (Data Scientist, Data Analyst and Data Engineer) are more common in medium-sized companies than in large ones, though the difference is small. Salaries for these roles are high, which is why few small companies can afford them. Roles with low counts that appear only in large companies, such as Machine Learning Manager and Finance Data Analyst, are considered complex: few people have the combined knowledge they require, so large companies pay more to hire and keep them.

Job Roles in a company with company size for Observations added data

In Poland the same roles are likewise distributed across medium and large companies, so adding the Poland observations further increases the number of employees in medium and large companies.

Job Title by Salary with Experience Level

Original Data

From the above plot we can see that the top 3 roles mentioned in previous plots (Data Scientist, Data Engineer and Data Analyst) are present at every experience level, from entry to executive. We can therefore compare salaries across all experience levels for these roles and see which roles pay more.

Observations added data

In the observations-added data we can see a small difference at the executive and entry levels, which increased slightly for data science roles compared to the original data. Roles that are otherwise less common, such as business analyst, are popular in Poland, so they carry more weight here as well.

Job Title by Salary with Company Size

Original Data

From the above we can see that small companies pay salaries of 20-100k and medium companies 50-200k, while large companies pay about the same as medium ones or even slightly less. This suggests that medium companies invest more in technical roles than large companies do, as they try to compete in the market with their newest technology products.
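A numeric summary per company size gives a way to verify the ranges read from the plot. This is a sketch with invented salaries; it assumes the company_size codes S, M and L are still in place.

```python
import pandas as pd

# Hypothetical salaries by company size.
df = pd.DataFrame({
    "company_size": ["S", "S", "M", "M", "L", "L"],
    "salary_in_usd": [20000, 100000, 50000, 200000, 45000, 190000],
})

# Minimum, median and maximum salary per company size, ordered small to large.
summary = (
    df.groupby("company_size")["salary_in_usd"]
    .agg(["min", "median", "max"])
    .reindex(["S", "M", "L"])
)
```

Comparing the medians for M and L is a quick way to test whether medium companies really pay at least as much as large ones in a given sample.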

Observations Added data

CONCLUSION

The majority of employees work in large and medium-sized companies, which pay higher salaries than small companies. The main work locations are the US, Canada and Great Britain, which are highly developed countries in terms of technology.

As an employee residence, Poland is also attractive alongside the top 3 countries, because jobs are quicker to get there than in other countries and these roles are also well paid.

Thank You!!